Sample-efficient Learning and Generalization with Text Representations
Humans have a remarkable ability to learn without much supervision. Often, a few labelled instances or a single demonstration is enough for us to learn a new concept. Most of our knowledge is acquired in a weakly unsupervised manner, via reading, perception, and active interaction with the world. Machine learning models, on the other hand, struggle to learn from limited supervision and often need large amounts of labelled data to learn. In many practical instances, however, such supervision is not available. Furthermore, collecting labelled instances for training may be expensive or infeasible due to privacy reasons. This calls for approaches that can adapt to new tasks or new domains without needing a lot of labelled data.
In this thesis, I address the limited supervision problem from two perspectives. First, I examine methods that exploit large amounts of unlabelled data to learn useful feature representations in a self-supervised manner. Such representations capture rich prior knowledge about the data, allowing them to be useful across many tasks, and enable data-efficient learning of new tasks. In particular, my work is concerned with the following key questions pertaining to text representations:
(i) How do we learn representations of larger units of text, beyond words?
(ii) How do we design training objectives that can efficiently learn such representations?
(iii) How do we come up with representations that allow efficient knowledge transfer to downstream language understanding tasks?
Second, I explore models and algorithms capable of learning from limited supervision. My work studies weakly supervised, few-shot and zero-shot learning settings with applications to text generation, sequence modeling, entity understanding and embodied control. My work demonstrates that text descriptions are an effective means of building models that generalize to new domains and new tasks without needing to experience supervised data for the new domain/task. I believe that the next generation of AI technologies will be driven by models that read and understand text to perform tasks.
PHD; Computer Science & Engineering; University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/169634/1/llajan_1.pd
Generative Adversarial Text to Image Synthesis
Automatic synthesis of realistic images from text would be interesting and
useful, but current AI systems are still far from this goal. However, in recent
years generic and powerful recurrent neural network architectures have been
developed to learn discriminative text feature representations. Meanwhile, deep
convolutional generative adversarial networks (GANs) have begun to generate
highly compelling images of specific categories, such as faces, album covers,
and room interiors. In this work, we develop a novel deep architecture and GAN
formulation to effectively bridge these advances in text and image modeling,
translating visual concepts from characters to pixels. We demonstrate the
capability of our model to generate plausible images of birds and flowers from
detailed text descriptions.
Comment: ICML 201
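The core idea of text-conditional image generation can be illustrated with a minimal sketch: a text embedding is concatenated with a noise vector and fed through a generator, so the same description yields varied images. The dimensions and the single-layer "generator" below are toy stand-ins for illustration, not the paper's deep convolutional architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration (not the paper's actual sizes).
TEXT_DIM, NOISE_DIM, IMG_PIXELS = 16, 8, 64

# Toy "generator": a single linear layer followed by tanh, standing in
# for the deep convolutional generator trained adversarially in the paper.
W_gen = rng.normal(scale=0.1, size=(TEXT_DIM + NOISE_DIM, IMG_PIXELS))

def generate(text_embedding: np.ndarray) -> np.ndarray:
    """Sample an image conditioned on a text embedding.

    Concatenating the embedding with fresh noise is what lets one
    caption map to many plausible images.
    """
    z = rng.normal(size=NOISE_DIM)
    cond = np.concatenate([text_embedding, z])
    return np.tanh(cond @ W_gen)  # pixel values in (-1, 1)

text = rng.normal(size=TEXT_DIM)   # stands in for an encoded caption
img = generate(text)
assert img.shape == (IMG_PIXELS,)
```

In the full model, the discriminator also receives the text embedding, so it can reject images that are realistic but mismatched to the caption.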
Discriminator-Guided Multi-step Reasoning with Language Models
In the context of multi-step reasoning, language model (LM) probabilities
are often miscalibrated -- solutions with high probabilities are not always
correct. Therefore, greedy decoding, which is the standard decoding method for
reasoning tasks, often yields incorrect solutions. In addition, methods such as
self-consistency and verifiers rely on sampling from the LM distribution and do
not tackle the underlying issue. To address this, we introduce Guiding
Multi-step ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise
decoding approach that nudges the model towards producing correct reasoning
steps. GRACE employs a discriminator model, which is trained to differentiate
correct steps from invalid ones, to adjust decoding preferences based on the
correctness of each reasoning step. Importantly, GRACE does not require
fine-tuning or re-training the LMs. When compared with conventional decoding
strategies over four popular math reasoning benchmarks, GRACE exhibits
significant improvements in both final answer accuracy and step correctness,
outperforming both greedy decoding and self-consistency. Our code can
be found at https://github.com/mukhal/grace.
Comment: 19 pages, 7 figures, and 8 tables
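The decoding loop described above can be sketched in a few lines. Here `propose_steps` and `discriminator_score` are toy stand-ins: in GRACE they would be an LM sampling candidate next steps and a trained correctness discriminator, respectively.

```python
# Minimal sketch of discriminator-guided stepwise decoding in the
# spirit of GRACE. Candidate steps are re-ranked by a correctness
# score rather than by the LM's (possibly miscalibrated) probability.

def propose_steps(prefix):
    # Toy candidate generator: in practice, sample k continuations
    # from the LM given the solution-so-far.
    return [prefix + [s] for s in ("2+3=5", "2+3=6", "5*2=10")]

def discriminator_score(candidate):
    # Toy correctness score: reward steps whose equation holds.
    lhs, rhs = candidate[-1].split("=")
    return 1.0 if eval(lhs) == int(rhs) else 0.0

def grace_decode(n_steps):
    solution = []
    for _ in range(n_steps):
        candidates = propose_steps(solution)
        # Pick the step the discriminator judges most correct; no
        # fine-tuning of the LM itself is needed.
        solution = max(candidates, key=discriminator_score)
    return solution

steps = grace_decode(2)
assert all(eval(s.split("=")[0]) == int(s.split("=")[1]) for s in steps)
```

The key property mirrored here is that the base model stays frozen; only the selection among its candidate steps changes.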
Exploring Demonstration Ensembling for In-context Learning
In-context learning (ICL) operates by showing language models (LMs) examples
of input-output pairs for a given task, i.e., demonstrations. The standard
approach for ICL is to prompt the LM with concatenated demonstrations followed
by the test input. This approach suffers from some issues. First, concatenation
offers almost no control over the contribution of each demo to the model
prediction. This can be sub-optimal when some demonstrations are irrelevant to
the test example. Second, due to the input length limit of some transformer
models, it might be infeasible to fit many examples into the context,
especially when dealing with long-input tasks. In this work, we explore
Demonstration Ensembling (DENSE) as an alternative to simple concatenation.
DENSE predicts outputs using subsets (i.e., buckets) of the demonstrations and
then combines the output probabilities resulting from each subset to produce
the final prediction. We study different ensembling methods using GPT-j and
experiment on 12 language tasks. Our experiments show weighted max ensembling
to outperform vanilla concatenation by as much as 2.4 average points. Code
available at https://github.com/mukhal/icl-ensembling.
Comment: Published at the ME-FoMo workshop at ICLR 2023. The arXiv version
includes evaluation on 5 more tasks
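The bucket-and-combine scheme can be sketched as follows. `bucket_predict` is a toy stand-in for prompting the LM with one bucket of demonstrations plus the test input; the weighted-max combination rule shown is one plausible reading of the ensembling methods studied.

```python
import numpy as np

LABELS = ["positive", "negative"]

def bucket_predict(bucket, test_input):
    # Toy stand-in for an LM prompted with one demo bucket: a smoothed
    # vote over the bucket's demonstration labels.
    votes = np.array([sum(1 for _, y in bucket if y == lab)
                      for lab in LABELS], dtype=float)
    return (votes + 1) / (votes + 1).sum()

def dense_weighted_max(demos, test_input, n_buckets=2, weights=None):
    # Split demos into buckets so no single prompt must hold them all.
    buckets = [demos[i::n_buckets] for i in range(n_buckets)]
    probs = np.stack([bucket_predict(b, test_input) for b in buckets])
    if weights is None:
        weights = np.ones(len(buckets))
    # Weighted max: per label, take the max weighted probability
    # across buckets, then renormalize.
    combined = (probs * weights[:, None]).max(axis=0)
    return combined / combined.sum()

demos = [("great movie", "positive"), ("loved it", "positive"),
         ("terrible", "negative"), ("fun ride", "positive")]
p = dense_weighted_max(demos, "what a film")
assert np.isclose(p.sum(), 1.0)
assert p[0] > p[1]  # majority-positive demos favour "positive"
```

Because each bucket is prompted separately, an irrelevant demonstration only pollutes its own bucket's distribution rather than the whole prompt.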
MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning
Recently, there has been an increasing interest in automated prompt
optimization based on reinforcement learning (RL). This approach offers
important advantages, such as generating interpretable prompts and being
compatible with black-box foundation models. However, the substantial prompt
space size poses challenges for RL-based methods, often leading to suboptimal
policy convergence. This paper introduces MultiPrompter, a new framework that
views prompt optimization as a cooperative game between prompters that take
turns composing a prompt together. Our cooperative prompt optimization
effectively reduces the problem size and helps prompters learn optimal prompts.
We test our method on the text-to-image task and show its ability to generate
higher-quality images than baselines.
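The cooperative factorization can be illustrated with a toy turn-taking loop: each prompter only searches its own, smaller action space while optimizing a shared reward. The vocabularies, target set, and greedy policy below are illustrative assumptions; the paper learns the policies with RL against a black-box scorer.

```python
# Toy shared reward: overlap with a target concept set, standing in
# for a black-box scorer (e.g. an image-quality model).
TARGET = {"a", "red", "bird", "flying"}

def reward(prompt_words):
    return len(set(prompt_words) & TARGET)

# Per-prompter vocabularies: the joint prompt space is factored
# across agents, which is what shrinks the problem size.
VOCAB = [["a", "the", "one"], ["red", "bird", "flying", "blue"]]

def cooperative_compose(n_turns=4):
    prompt = []
    for t in range(n_turns):
        prompter_vocab = VOCAB[t % 2]  # prompters alternate turns
        # Each prompter greedily extends the shared prompt; in the
        # paper this choice is made by a learned RL policy.
        best = max(prompter_vocab, key=lambda w: reward(prompt + [w]))
        prompt.append(best)
    return prompt

prompt = cooperative_compose()
assert reward(prompt) >= 3
```

The composed prompt stays a readable token sequence throughout, which is what makes the approach interpretable and compatible with black-box models.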
Merging Generated and Retrieved Knowledge for Open-Domain QA
Open-domain question answering (QA) systems are often built with retrieval
modules. However, retrieving passages from a given source is known to suffer
from insufficient knowledge coverage. Alternatively, prompting large language
models (LLMs) to generate contextual passages based on their parametric
knowledge has been shown to improve QA performance. Yet, LLMs tend to
"hallucinate" content that conflicts with the retrieved knowledge. Based on the
intuition that answers supported by both sources are more likely to be correct,
we propose COMBO, a Compatibility-Oriented knowledge Merging for Better
Open-domain QA framework, to effectively leverage the two sources of
information. Concretely, we match LLM-generated passages with retrieved
counterparts into compatible pairs, based on discriminators trained with silver
compatibility labels. Then a Fusion-in-Decoder-based reader model handles
passage pairs to arrive at the final answer. Experiments show that COMBO
outperforms competitive baselines on three out of four tested open-domain QA
benchmarks. Further analysis reveals that our proposed framework demonstrates
greater efficacy in scenarios with a higher degree of knowledge conflicts.
Comment: EMNLP 2023 - Camera Ready
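The pairing step can be sketched with a toy compatibility function. Here word overlap stands in for the trained discriminators, and greedy one-to-one matching stands in for the paper's pairing procedure; the downstream Fusion-in-Decoder reader is omitted.

```python
def compatibility(generated: str, retrieved: str) -> float:
    # Toy compatibility score: Jaccard word overlap. COMBO instead
    # uses discriminators trained on silver compatibility labels.
    g, r = set(generated.split()), set(retrieved.split())
    return len(g & r) / max(len(g | r), 1)

def pair_passages(generated_passages, retrieved_passages):
    pairs, used = [], set()
    for g in generated_passages:
        # Greedily pick the most compatible unused retrieved passage,
        # so each pair carries mutually supporting evidence.
        best = max((r for r in retrieved_passages if r not in used),
                   key=lambda r: compatibility(g, r))
        used.add(best)
        pairs.append((g, best))
    return pairs

gen = ["paris is the capital of france", "the eiffel tower is in paris"]
ret = ["the eiffel tower stands in paris", "france has capital paris"]
pairs = pair_passages(gen, ret)
assert pairs[0][1] == "france has capital paris"
assert pairs[1][1] == "the eiffel tower stands in paris"
```

The intuition encoded here is the one stated in the abstract: an answer is more trustworthy when a generated passage and a retrieved passage agree.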
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Pretrained Language Models (LMs) memorize a vast amount of knowledge during
initial pretraining, including information that may violate the privacy of
personal lives and identities. Previous work addressing privacy issues for
language models has mostly focused on data preprocessing and differential
privacy methods, both requiring re-training the underlying LM. We propose
knowledge unlearning as an alternative method to reduce privacy risks for LMs
post hoc. We show that simply applying the unlikelihood training objective to
target token sequences is effective at forgetting them with little to no
degradation of general language modeling performance; it sometimes even
substantially improves the underlying LM with just a few iterations. We also
find that sequential unlearning is better than trying to unlearn all the data
at once and that unlearning is highly dependent on which kind of data (domain)
is forgotten. By showing comparisons with a previous data preprocessing method
known to mitigate privacy risks for LMs, we show that unlearning can give a
stronger empirical privacy guarantee in scenarios where the data vulnerable to
extraction attacks are known a priori while being orders of magnitude more
computationally efficient. We release the code and dataset needed to replicate
our results at https://github.com/joeljang/knowledge-unlearning
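The unlikelihood idea can be shown on a toy model: instead of maximizing log p(token), gradient-descend on -log(1 - p(token)) for the tokens to be forgotten. A single softmax over a toy vocabulary stands in for the LM; the analytic gradient below follows from the softmax derivative.

```python
import numpy as np

VOCAB = ["alice", "lives", "at", "42", "elm", "street"]
logits = np.zeros(len(VOCAB))  # toy "model": one categorical distribution

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def unlikelihood_step(logits, forget_idx, lr=1.0):
    # Gradient of L = -log(1 - p_f) w.r.t. logit j is
    # p_f * (delta_{jf} - p_j) / (1 - p_f): it pushes the forgotten
    # token's logit down and redistributes mass to the rest.
    p = softmax(logits)
    delta = np.eye(len(p))[forget_idx]
    grad = p[forget_idx] * (delta - p) / (1.0 - p[forget_idx])
    return logits - lr * grad

before = softmax(logits)[3]          # p("42") before unlearning
for _ in range(5):
    logits = unlikelihood_step(logits, forget_idx=3)
after = softmax(logits)[3]
assert after < before                # the target token is suppressed
```

In the actual method this objective is applied to whole target token sequences of a pretrained LM for a few iterations, which is why it is orders of magnitude cheaper than re-training.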
TOD-Flow: Modeling the Structure of Task-Oriented Dialogues
Task-Oriented Dialogue (TOD) systems have become crucial components in
interactive artificial intelligence applications. While recent advances have
capitalized on pre-trained language models (PLMs), they exhibit limitations
regarding transparency and controllability. To address these challenges, we
propose a novel approach focusing on inferring the TOD-Flow graph from dialogue
data annotated with dialog acts, uncovering the underlying task structure in
the form of a graph. The inferred TOD-Flow graph can be easily integrated with
any dialogue model to improve its prediction performance, transparency, and
controllability. Our TOD-Flow graph learns what a model can, should, and should
not predict, effectively reducing the search space and providing a rationale
for the model's prediction. We show that the proposed TOD-Flow graph better
resembles human-annotated graphs compared to prior approaches. Furthermore,
when combined with several dialogue policies and end-to-end dialogue models, we
demonstrate that our approach significantly improves dialog act classification
and end-to-end response generation performance in the MultiWOZ and SGD
benchmarks. Code available at: https://github.com/srsohn/TOD-Flo
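The "can / should / should not" constraints can be illustrated as a precondition graph that prunes a dialogue model's action space. The acts and conditions below are invented toy examples, not MultiWOZ or SGD annotations.

```python
# Acts the system *can* take only after their precondition acts occurred.
CAN_AFTER = {
    "book_hotel": {"request_dates", "inform_availability"},
    "inform_availability": {"request_dates"},
    "request_dates": set(),
}
# Acts the system *should not* repeat once taken.
NO_REPEAT = {"book_hotel"}

def allowed_acts(history):
    """Filter dialog acts by the inferred task-structure graph.

    This both shrinks the model's search space and gives a rationale
    ("book_hotel is blocked: dates not yet requested") for predictions.
    """
    done = set(history)
    return [act for act, preconds in CAN_AFTER.items()
            if preconds <= done
            and not (act in NO_REPEAT and act in done)]

# Early in the dialogue, only "request_dates" is licensed.
assert allowed_acts([]) == ["request_dates"]
assert "book_hotel" in allowed_acts(["request_dates", "inform_availability"])
assert "book_hotel" not in allowed_acts(
    ["request_dates", "inform_availability", "book_hotel"])
```

Any dialogue model can be wrapped this way: its candidate acts are intersected with `allowed_acts(history)` before a response is generated, which is how the graph improves both accuracy and controllability without retraining.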